
    CASAPose: Class-Adaptive and Semantic-Aware Multi-Object Pose Estimation

    Applications in the field of augmented reality or robotics often require joint localisation and 6D pose estimation of multiple objects. However, most algorithms need to train one network per object class to provide the best results. Analysing all visible objects therefore demands multiple inferences, which is memory- and time-consuming. We present a new single-stage architecture called CASAPose that determines 2D-3D correspondences for pose estimation of multiple different objects in RGB images in one pass. It is fast and memory efficient, and achieves high accuracy for multiple objects by exploiting the output of a semantic segmentation decoder as control input to a keypoint recognition decoder via local class-adaptive normalisation. Our new differentiable regression of keypoint locations significantly contributes to a faster closing of the domain gap between real test and synthetic training data. We apply segmentation-aware convolutions and upsampling operations to increase the focus inside the object mask and to reduce mutual interference of occluding objects. For each inserted object, the network grows by only one output segmentation map and a negligible number of parameters. We outperform state-of-the-art approaches in challenging multi-object scenes with inter-object occlusion and synthetic training. Comment: BMVC 2022, camera-ready version (this submission includes the paper and supplementary material).
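    The abstract's "local class-adaptive normalisation" suggests a mechanism in the spirit of spatially-adaptive normalisation; below is a minimal, hypothetical PyTorch sketch of one possible interpretation, in which the segmentation decoder's per-pixel class map selects class-specific scale and shift parameters that modulate the keypoint decoder's features (module and parameter names are illustrative assumptions, not the authors' code):

```python
# Hypothetical sketch of class-adaptive normalisation: a per-pixel class map
# from the segmentation decoder selects learned per-class (gamma, beta)
# parameters that modulate normalised keypoint-decoder features.
import torch
import torch.nn as nn

class ClassAdaptiveNorm(nn.Module):
    def __init__(self, num_features: int, num_classes: int):
        super().__init__()
        self.norm = nn.InstanceNorm2d(num_features, affine=False)
        self.gamma = nn.Embedding(num_classes, num_features)  # per-class scale
        self.beta = nn.Embedding(num_classes, num_features)   # per-class shift

    def forward(self, x: torch.Tensor, seg_logits: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) keypoint-decoder features
        # seg_logits: (B, num_classes, H, W) from the segmentation decoder
        class_ids = seg_logits.argmax(dim=1)                   # (B, H, W)
        gamma = self.gamma(class_ids).permute(0, 3, 1, 2)      # (B, C, H, W)
        beta = self.beta(class_ids).permute(0, 3, 1, 2)
        return self.norm(x) * (1.0 + gamma) + beta
```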

    BTSeg: Barlow Twins Regularization for Domain Adaptation in Semantic Segmentation

    Semantic image segmentation is a critical component in many computer vision systems, such as autonomous driving. In such applications, adverse conditions (heavy rain, nighttime, snow, extreme lighting) pose specific challenges, yet are at the same time typically underrepresented in the available datasets. Generating more training data is cumbersome and expensive, and the process itself is error-prone due to the inherent aleatoric uncertainty. To address this challenging problem, we propose BTSeg, which exploits image-level correspondences as a weak supervision signal to learn a segmentation model that is agnostic to adverse conditions. To this end, our approach uses the Barlow Twins loss from the field of unsupervised learning and treats images taken at the same location but under different adverse conditions as "augmentations" of the same unknown underlying base image. This allows the training of a segmentation model that is robust to appearance changes introduced by different adverse conditions. We evaluate our approach on ACDC and the new challenging ACG benchmark to demonstrate its robustness and generalization capabilities. Our approach performs favorably when compared to the current state-of-the-art methods, while also being simpler to implement and train. The code will be released upon acceptance.
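    For context, a minimal sketch of the Barlow Twins objective that BTSeg builds on, written here for pooled features of two co-located images taken under different conditions; the pooling choice, the names and the loss weighting are assumptions for illustration, not the paper's implementation:

```python
# Sketch of the Barlow Twins loss: decorrelate embedding dimensions while
# making the two "views" (same location, different adverse condition) agree.
import torch

def barlow_twins_loss(z_a: torch.Tensor, z_b: torch.Tensor,
                      lambda_offdiag: float = 5e-3) -> torch.Tensor:
    # z_a, z_b: (N, D) embeddings, e.g. globally pooled segmentation features.
    n, _ = z_a.shape
    z_a = (z_a - z_a.mean(0)) / (z_a.std(0) + 1e-6)   # normalise per dimension
    z_b = (z_b - z_b.mean(0)) / (z_b.std(0) + 1e-6)
    c = (z_a.T @ z_b) / n                             # (D, D) cross-correlation
    diag = torch.diagonal(c)
    on_diag = (diag - 1).pow(2).sum()                 # invariance term
    off_diag = c.pow(2).sum() - diag.pow(2).sum()     # redundancy-reduction term
    return on_diag + lambda_offdiag * off_diag
```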

    Improved Hand-Tracking Framework with a Recovery Mechanism

    Abstract: Hand-tracking is fundamental to translating sign language into a spoken language. Accurate and reliable sign language translation depends on effective and accurate hand-tracking. This paper proposes an improved hand-tracking framework that extends a previous framework with a tracking recovery algorithm to better handle occlusion and to improve both the discrimination between the two hands and their tracking. The framework was evaluated on 30 South African Sign Language phrases that use a single hand, both hands without occlusion, and both hands with occlusion. Ten individuals performed the gestures in constrained and unconstrained environments. Overall, the proposed framework achieved an average success rate of 91.8%, compared to an average success rate of 81.1% for the previous framework. The results show improved tracking accuracy across all signs in constrained and unconstrained environments.

    Sequential Quantum Teleportation of Optical Coherent States

    We demonstrate a sequence of two quantum teleportations of optical coherent states, combining two high-fidelity teleporters for continuous variables. In our experiment, the individual teleportation fidelities are evaluated as F_1 = 0.70 \pm 0.02 and F_2 = 0.75 \pm 0.02, while the fidelity between the input and the sequentially teleported states is determined as F^{(2)} = 0.57 \pm 0.02. This still exceeds the optimal fidelity of one half for classical teleportation of arbitrary coherent states and almost attains the value of the first (unsequential) quantum teleportation experiment with optical coherent states. Comment: 5 pages, 4 figures.
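    As a rough, idealised consistency check (not part of the paper's analysis): modelling each unity-gain continuous-variable teleporter as a Gaussian channel that adds excess noise \bar{n}_i to each quadrature, with coherent-state fidelity F_i = (1 + \bar{n}_i)^{-1}, and assuming the excess noises of the two cascaded teleporters simply add, the reported values are mutually consistent:

```latex
% Idealised model, stated above as an assumption: F_i = (1 + \bar{n}_i)^{-1}.
\begin{align*}
  \bar{n}_1 &= F_1^{-1} - 1 \approx 0.43, &
  \bar{n}_2 &= F_2^{-1} - 1 \approx 0.33, \\
  F^{(2)} &\approx \frac{1}{1 + \bar{n}_1 + \bar{n}_2} \approx 0.57 > \tfrac{1}{2}.
\end{align*}
```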

    Hyperspectral Demosaicing of Snapshot Camera Images Using Deep Learning

    Spectral imaging technologies have rapidly evolved during the past decades. The recent development of single-camera-one-shot techniques for hyperspectral imaging allows multiple spectral bands to be captured simultaneously (3x3, 4x4 or 5x5 mosaic), opening up a wide range of applications. Examples include intraoperative imaging, agricultural field inspection and food quality assessment. To capture images across a wide spectral range, i.e. to achieve high spectral resolution, the sensor design sacrifices spatial resolution. With increasing mosaic size, this effect becomes more severe. Furthermore, demosaicing is challenging: without incorporating edge, shape, and object information during interpolation, chromatic artifacts are likely to appear in the obtained images. Recent approaches use neural networks for demosaicing, enabling direct information extraction from image data. However, obtaining training data for these approaches poses a challenge as well. This work proposes a parallel neural-network-based demosaicing procedure trained on a new ground-truth dataset captured in a controlled environment by a hyperspectral snapshot camera with a 4x4 mosaic pattern. The dataset is a combination of real captured scenes and images from publicly available data adapted to the 4x4 mosaic pattern. To obtain real-world ground-truth data, we performed multiple camera captures with 1-pixel shifts in order to compose the entire data cube. Experiments show that the proposed network outperforms state-of-the-art networks. Comment: German Conference on Pattern Recognition (GCPR) 2022.
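    To illustrate the data layout described above, here is a minimal, hypothetical sketch (not the authors' pipeline) that unpacks a raw image from a 4x4 snapshot mosaic into a 16-band cube at one quarter of the spatial resolution, i.e. the kind of low-resolution input a demosaicing network would interpolate back to full resolution:

```python
# Hypothetical helper: rearrange a 4x4 spectral mosaic into a 16-band cube.
import numpy as np

def mosaic_to_cube(raw: np.ndarray, pattern: int = 4) -> np.ndarray:
    # raw: (H, W) sensor image; H and W must be divisible by `pattern`.
    h, w = raw.shape
    assert h % pattern == 0 and w % pattern == 0
    bands = [raw[i::pattern, j::pattern]       # one band per mosaic cell
             for i in range(pattern) for j in range(pattern)]
    return np.stack(bands, axis=0)             # (pattern**2, H/pattern, W/pattern)

cube = mosaic_to_cube(np.zeros((512, 640), dtype=np.float32))
print(cube.shape)  # (16, 128, 160)
```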